Supplementary Material for DreamHuman: Animatable 3D Avatars from Text
This document contains additional details and experiments that did not fit in the main text due to space constraints. For animations and additional results, please also check the included videos. We use an optimization strategy similar to DreamFusion's, so unless otherwise noted the hyperparameters remain the same. Like DreamFusion, we train on a TPUv4 machine with 4 chips. We increase the number of optimization iterations from 15,000 to 50,000; we did not observe any significant benefit from training for more iterations.
Score Distillation via Reparametrized DDIM
Lukoianov, Artem, Borde, Haitz Sáez de Ocáriz, Greenewald, Kristjan, Guizilini, Vitor Campagnolo, Bagautdinov, Timur, Sitzmann, Vincent, Solomon, Justin
While 2D diffusion models generate realistic, high-detail images, 3D shape generation methods like Score Distillation Sampling (SDS) built on these 2D diffusion models produce cartoon-like, over-smoothed shapes. To help explain this discrepancy, we show that the image guidance used in Score Distillation can be understood as the velocity field of a 2D denoising generative process, up to the choice of a noise term. In particular, after a change of variables, SDS resembles a high-variance version of Denoising Diffusion Implicit Models (DDIM) with a differently-sampled noise term: SDS introduces noise i.i.d. randomly at each step, while DDIM infers it from the previous noise predictions. This excessive variance can lead to over-smoothing and unrealistic outputs. We show that a better noise approximation can be recovered by inverting DDIM in each SDS update step. This modification makes SDS's generative process for 2D images almost identical to DDIM. In 3D, it removes over-smoothing, preserves higher-frequency detail, and brings the generation quality closer to that of 2D samplers. Experimentally, our method achieves better or similar 3D generation quality compared to other state-of-the-art Score Distillation methods, all without training additional neural networks or requiring multi-view supervision, while providing useful insights into the relationship between 2D and 3D asset generation with diffusion models.
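The variance argument above can be illustrated with a small numeric sketch. This is a toy 1-D stand-in, not the paper's actual models: for Gaussian data the optimal denoiser is known in closed form, so we can compare an SDS-style update that resamples its noise term i.i.d. at every step against the noise-free expected update it approximates (standing in for the lower-variance noise recovered by DDIM inversion).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D setting (illustrative assumption): the "render" is one scalar x,
# data follow x0 ~ N(mu, 1), and noising is x_t = x0 + sigma * eps, so the
# optimal noise prediction E[eps | x_t] has a closed form.
mu, sigma, lr = 3.0, 0.8, 0.05

def eps_hat(x_t):
    # Optimal noise prediction E[eps | x_t] for the Gaussian toy model.
    return sigma * (x_t - mu) / (1.0 + sigma**2)

def sds_grad(x):
    # SDS: fresh i.i.d. noise drawn at every optimization step.
    eps = rng.standard_normal()
    return eps_hat(x + sigma * eps) - eps

def expected_grad(x):
    # The same update with the noise term integrated out -- a stand-in for
    # the lower-variance noise an inverted DDIM step would recover.
    return sigma * (x - mu) / (1.0 + sigma**2)

x_sds = x_det = 0.0
tail = []
for i in range(2000):
    x_sds -= lr * sds_grad(x_sds)
    x_det -= lr * expected_grad(x_det)
    if i >= 1500:
        tail.append(x_sds)
# x_det settles at mu; x_sds keeps jittering around mu with visible spread.
```

Both estimators share the same mean direction, so the difference shows up purely as residual variance around the optimum, mirroring the over-smoothing diagnosis in the abstract.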
Grounded Compositional and Diverse Text-to-3D with Pretrained Multi-View Diffusion Model
Li, Xiaolong, Mo, Jiawei, Wang, Ying, Parameshwara, Chethan, Fei, Xiaohan, Swaminathan, Ashwin, Taylor, CJ, Tu, Zhuowen, Favaro, Paolo, Soatto, Stefano
In this paper, we propose an effective two-stage approach named Grounded-Dreamer to generate 3D assets that can accurately follow complex, compositional text prompts while achieving high fidelity by using a pre-trained multi-view diffusion model. Multi-view diffusion models, such as MVDream, have been shown to generate high-fidelity 3D assets using score distillation sampling (SDS). However, applied naively, these methods often fail to comprehend compositional text prompts, and may entirely omit certain subjects or parts. To address this issue, we first advocate leveraging text-guided 4-view images as the bottleneck in the text-to-3D pipeline. We then introduce an attention refocusing mechanism to encourage text-aligned 4-view image generation, without the need to re-train the multi-view diffusion model or craft a high-quality compositional 3D dataset. We further propose a hybrid optimization strategy to encourage synergy between the SDS loss and the sparse RGB reference images. Our method consistently outperforms previous state-of-the-art (SOTA) methods in generating compositional 3D assets, excelling in both quality and accuracy, and enabling diverse 3D generation from the same text prompt.
Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior
Wu, Zike, Zhou, Pan, Yi, Xuanyu, Yuan, Xiaoding, Zhang, Hanwang
Score distillation sampling (SDS) and its variants have greatly boosted the development of text-to-3D generation, but remain vulnerable to geometry collapse and poor textures. To solve this issue, we first analyze SDS in depth and find that its distillation sampling process in fact corresponds to the trajectory sampling of a stochastic differential equation (SDE): SDS samples along an SDE trajectory to yield a less noisy sample, which then serves as guidance to optimize a 3D model. However, the randomness in SDE sampling often leads to a diverse and unpredictable sample that is not always less noisy, and thus is not consistently correct guidance, explaining the vulnerability of SDS. Since for any SDE there always exists an ordinary differential equation (ODE) whose trajectory sampling deterministically and consistently converges to the same target point as the SDE, we propose a novel and effective "Consistent3D" method that exploits this deterministic ODE sampling prior for text-to-3D generation. Specifically, at each training iteration, given an image rendered by a 3D model, we first estimate its desired 3D score function with a pre-trained 2D diffusion model and build an ODE for trajectory sampling. Next, we design a consistency distillation sampling loss that samples two adjacent points along the ODE trajectory and uses the less noisy sample to guide the noisier one, distilling the deterministic prior into the 3D model. Experimental results show the efficacy of our Consistent3D in generating high-fidelity and diverse 3D objects and large-scale scenes, as shown in Fig. 1. The code is available at https://github.com/sail-sg/Consistent3D.
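The SDE-versus-ODE contrast can be made concrete in a toy 1-D model (an illustrative assumption, not the paper's setup): for Gaussian data the probability-flow ODE has a closed-form endpoint, so deterministic Euler sampling can be checked to converge consistently to the predicted target, with no sampling randomness involved.

```python
import numpy as np

# Toy model (assumption for illustration): data x0 ~ N(mu, 1) with noise
# schedule sigma(t) = t gives marginals p_t = N(mu, 1 + t^2), score
# -(x - mu)/(1 + t^2), and probability-flow ODE
#   dx/dt = -sigma'(t) * sigma(t) * score = t * (x - mu) / (1 + t^2).
mu = 2.0

def ode_velocity(x, t):
    return t * (x - mu) / (1.0 + t * t)

def sample_ode(x_T, T=10.0, n_steps=1000):
    # Deterministic Euler sampling from t = T down to t = 0. Adjacent
    # pairs (x_t, x_{t-h}) along this trajectory are the kind of samples
    # the consistency distillation loss compares.
    h = T / n_steps
    x, t = x_T, T
    for _ in range(n_steps):
        x -= h * ode_velocity(x, t)
        t -= h
    return x

# Closed form: x(0) = mu + (x_T - mu) / sqrt(1 + T^2), so the ODE maps
# each x_T to a single consistent target point.
x0 = sample_ode(12.0)
```

Rerunning `sample_ode` with the same input reproduces the same output exactly, which is the "consistently correct guidance" property the abstract contrasts with stochastic SDE sampling.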
RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D
Qiu, Lingteng, Chen, Guanying, Gu, Xiaodong, Zuo, Qi, Xu, Mutian, Wu, Yushuang, Yuan, Weihao, Dong, Zilong, Bo, Liefeng, Han, Xiaoguang
Lifting 2D diffusion for 3D generation is a challenging problem due to the lack of geometric prior and the complex entanglement of materials and lighting in natural images. Existing methods have shown promise by first creating the geometry through score-distillation sampling (SDS) applied to rendered surface normals, followed by appearance modeling. However, relying on a 2D RGB diffusion model to optimize surface normals is suboptimal due to the distribution discrepancy between natural images and normal maps, leading to instability in optimization. In this paper, recognizing that normal and depth information effectively describes scene geometry and can be automatically estimated from images, we propose to learn a generalizable Normal-Depth diffusion model for 3D generation. We achieve this by training on the large-scale LAION dataset together with the generalizable image-to-depth and normal prior models. In an attempt to alleviate the mixed illumination effects in the generated materials, we introduce an albedo diffusion model to impose data-driven constraints on the albedo component. Our experiments show that when integrated into existing text-to-3D pipelines, our models significantly enhance the detail richness, achieving state-of-the-art results. Our project page is https://aigc3d.github.io/richdreamer/.
Text-to-3D with Classifier Score Distillation
Yu, Xin, Guo, Yuan-Chen, Li, Yangguang, Liang, Ding, Zhang, Song-Hai, Qi, Xiaojuan
Creating a high-quality 3D asset remains challenging and expensive, as it requires a high level of expertise. Automating this process with generative models has therefore become an important problem, which remains challenging due to the scarcity of data and the complexity of 3D representations. Recently, techniques based on Score Distillation Sampling (SDS) (Poole et al., 2022; Lin et al., 2023; Chen et al., 2023; Wang et al., 2023b), also known as Score Jacobian Chaining (SJC) (Wang et al., 2023a), have emerged as a major research direction for text-to-3D generation, as they can produce high-quality and intricate 3D results from diverse text prompts without requiring 3D data for training. The core principle behind SDS is to optimize 3D representations by encouraging their rendered images to move towards high probability density regions conditioned on the text, where the supervision is provided by a pre-trained 2D diffusion model (Ho et al., 2020; Sohl-Dickstein et al., 2015; Rombach et al., 2022; Saharia et al., 2022; Balaji et al., 2022). DreamFusion (Poole et al., 2022) advocates the use of SDS for the optimization of Neural Radiance Fields (NeRF).
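The SDS principle just described can be sketched as a minimal optimization loop. Everything here is a toy stand-in: `render` replaces differentiable volume rendering of a NeRF, `denoiser` replaces the frozen pretrained text-conditioned diffusion model, and `target` plays the role of a high-density region of the image prior.

```python
import numpy as np

rng = np.random.default_rng(1)

theta = np.zeros(2)                  # stand-in for NeRF parameters
target = np.array([1.5, -0.5])       # toy "high-density region" of the prior

def render(theta):
    # Identity renderer (assumption); in practice, a differentiable
    # volume rendering of the 3D representation from a random camera.
    return theta

def denoiser(x_t, sigma):
    # Toy noise predictor whose output pulls x_t toward `target`;
    # stands in for the frozen pretrained 2D diffusion model.
    return (x_t - target) * sigma / (1.0 + sigma**2)

def sds_step(theta, lr=0.05, sigma=1.0):
    x = render(theta)
    eps = rng.standard_normal(x.shape)
    x_t = x + sigma * eps                  # diffuse the rendering
    grad = denoiser(x_t, sigma) - eps      # SDS skips the denoiser Jacobian
    return theta - lr * grad               # chain rule through render only

for _ in range(3000):
    theta = sds_step(theta)
# theta drifts toward the high-density region around `target`.
```

The key structural point is in `sds_step`: the gradient is the residual between the predicted and injected noise, backpropagated through the renderer only, not through the diffusion model.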
IT3D: Improved Text-to-3D Generation with Explicit View Synthesis
Chen, Yiwen, Zhang, Chi, Yang, Xiaofeng, Cai, Zhongang, Yu, Gang, Yang, Lei, Lin, Guosheng
Recent strides in Text-to-3D techniques have been propelled by distilling knowledge from powerful large text-to-image diffusion models (LDMs). Nonetheless, existing Text-to-3D approaches often grapple with challenges such as over-saturation, inadequate detailing, and unrealistic outputs. This study presents a novel strategy that leverages explicitly synthesized multi-view images to address these issues. Our approach involves the utilization of image-to-image pipelines, empowered by LDMs, to generate posed high-quality images based on the renderings of coarse 3D models. Although the generated images mostly alleviate the aforementioned issues, challenges such as view inconsistency and significant content variance persist due to the inherent generative nature of large diffusion models, posing extensive difficulties in leveraging these images effectively. To overcome this hurdle, we advocate integrating a discriminator alongside a novel Diffusion-GAN dual training strategy to guide the training of 3D models. For the incorporated discriminator, the synthesized multi-view images are considered real data, while the renderings of the optimized 3D models function as fake data. We conduct a comprehensive set of experiments that demonstrate the effectiveness of our method over baseline approaches.
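The Diffusion-GAN labeling described above (LDM-synthesized multi-view images as real, renderings of the 3D model as fake) can be sketched with a minimal logistic discriminator on feature vectors; the names, shapes, and constants here are illustrative assumptions, not IT3D's actual architecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def d_loss(d, real_feats, fake_feats):
    # Binary cross-entropy: LDM-synthesized multi-view images are labeled
    # real, renderings of the current 3D model are labeled fake.
    p_real = sigmoid(real_feats @ d)
    p_fake = sigmoid(fake_feats @ d)
    return (-np.mean(np.log(p_real + 1e-8))
            - np.mean(np.log(1.0 - p_fake + 1e-8)))

def g_adv_loss(d, fake_feats):
    # Non-saturating generator term: push renders to look "real" to the
    # discriminator, used alongside the distillation objective in the
    # dual training loop.
    return -np.mean(np.log(sigmoid(fake_feats + 0.0 @ d if False else fake_feats @ d) + 1e-8))

# A discriminator aligned with the real/fake separation scores a lower
# loss than an uninformative one.
real = np.ones((4, 1))
fake = -np.ones((4, 1))
aligned = np.array([5.0])
blind = np.array([0.0])
```

Alternating minimization of `d_loss` over the discriminator and `g_adv_loss` over the 3D model's renders gives the dual training dynamic the abstract describes.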
Magic3D: High-Resolution Text-to-3D Content Creation
Lin, Chen-Hsuan, Gao, Jun, Tang, Luming, Takikawa, Towaki, Zeng, Xiaohui, Huang, Xun, Kreis, Karsten, Fidler, Sanja, Liu, Ming-Yu, Lin, Tsung-Yi
DreamFusion has recently demonstrated the utility of a pre-trained text-to-image diffusion model to optimize Neural Radiance Fields (NeRF), achieving remarkable text-to-3D synthesis results. However, the method has two inherent limitations: (a) extremely slow optimization of NeRF and (b) low-resolution image space supervision on NeRF, leading to low-quality 3D models with a long processing time. In this paper, we address these limitations by utilizing a two-stage optimization framework. First, we obtain a coarse model using a low-resolution diffusion prior, accelerated with a sparse 3D hash grid structure. Using the coarse representation as the initialization, we further optimize a textured 3D mesh model with an efficient differentiable renderer interacting with a high-resolution latent diffusion model. Our method, dubbed Magic3D, can create high-quality 3D mesh models in 40 minutes, which is 2x faster than DreamFusion (reportedly taking 1.5 hours on average), while also achieving higher resolution. User studies show that 61.7% of raters prefer our approach over DreamFusion. Together with the image-conditioned generation capabilities, we provide users with new ways to control 3D synthesis, opening up new avenues to various creative applications.
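The two-stage coarse-to-fine strategy can be caricatured in a few lines. This is a deliberately simplified numeric sketch, not Magic3D's pipeline: fit a small low-resolution field cheaply first, then use its upsampled result to initialize full-resolution refinement.

```python
import numpy as np

target = np.sin(np.linspace(0.0, np.pi, 64))   # stand-in for the "scene"

def fit(x, tgt, steps, lr=0.5):
    # Gradient descent on 0.5 * ||x - tgt||^2 at the given resolution.
    for _ in range(steps):
        x = x - lr * (x - tgt)
    return x

coarse = fit(np.zeros(16), target[::4], steps=50)   # stage 1: cheap, low-res
fine_init = np.repeat(coarse, 4)                    # upsample as initialization
fine = fit(fine_init, target, steps=50)             # stage 2: high-res refine
```

The design point mirrored here is that stage 2 starts from a good initialization rather than from scratch, so the expensive high-resolution phase does less work.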